Quick links (more details at introduction):
Assignment used the base docker image acessible here.
Chosen study: Serotonin-induced hyperactivity in SSRI-resistant major depressive disorder patient-derived neurons(Vadodaria et al., 2019).
GEO data set: GEO accession GSE125664(Vadodaria KC, 2019)
Previously, in A1, we have found an appropriate study (in terms of RNA expression analysis and clinical relevance) by searching RNAseq data set available at [Gene Expression Omnibus repository - GEO] (www.ncbi.nlm.nih.gov/geo)(“Gene Expression Omnibus,” n.d.).
We used Geometadb(Zhu et al., 2008) to perform the search of GEO data sets and RSQLite(Müller et al., 2020) and GEOquerry(Davis & Meltzer, 2007) package to perform the queries using a downloaded copy of GEO contents’ metadata. These packages were installed, when needed through BiocManager(Morgan, 2019) functionality. knitr(Xie, 2021) package was later used to report data statistics and edgeR to aid on modeling and normalization of data.
After 4 search iterations we have chosen the following study: [Serotonin-induced hyperactivity in SSRI-resistant major depressive disorder patient-derived neurons] (https://pubmed.ncbi.nlm.nih.gov/30700803/)(Vadodaria et al., 2019) in which the authors tried to elucidate the common clinical challenge of patients suffering from Major Depressive Disorder (MDD) who fail or respond only partially to SSRI therapy (aka non-remitters). We currently have no biomarkers to identify these patients. Moreover, is not known the specific mechanism underpinning SSRI and similar SNRI-induced suicidality.
It also provided the raw mRNA counts for analysis. The controls are represented by expression data in forebrain neuron lines developed from Induced pluripotent stem cells (iPSCs) removed through non-CNS biopsy of healthy patients exposed to SSRIs. The test group involve the same cells from MDD patients which suffered from SSRI treatment resistance and MDD patients which responded to SSRI treatment.
The sample size in this study is 9, with adequate control and 2 testing groups (MDD with an without SSRI-based treatment resistance).
The data for this study is contained in one supplemental .csv file called “GSE125664_Vadodaria_MDDNeurons_RawCounts”:
Platform title: Illumina HiSeq 2500 (Homo sapiens)
Submission date: Mar 14 2013
Last update date: Mar 27 2019
Organisms: Homo sapiens
Snapshot of data first 10 mapped genes at this stage:
kable(SSRI_exp[1:10,1:10], format = "html")
| gname | H_neurons_1 | H_neurons_2 | H_neurons_3 | NR_neurons_1 | NR_neurons_2 | NR_neurons_3 | R_neurons_1 | R_neurons_2 | R_neurons_3 |
|---|---|---|---|---|---|---|---|---|---|
| A1BG | 220 | 219 | 205 | 211 | 202 | 134 | 270 | 163 | 220 |
| A1BG-AS1 | 106 | 98 | 96 | 91 | 165 | 128 | 104 | 130 | 120 |
| A1CF | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| A2M | 9441 | 18 | 4 | 2541 | 336 | 488 | 2396 | 1584 | 72 |
| A2M-AS1 | 9 | 12 | 9 | 22 | 36 | 5 | 26 | 31 | 7 |
| A2ML1 | 11 | 0 | 1 | 6 | 53 | 23 | 21 | 10 | 1 |
| A2MP1 | 4 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| A3GALT2 | 4 | 1 | 1 | 3 | 10 | 7 | 5 | 6 | 2 |
| A4GALT | 85 | 48 | 78 | 29 | 1 | 0 | 17 | 0 | 43 |
| A4GNT | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
dim(SSRI_exp)
## [1] 22351 10
And we have 22351 genes with measures for 3 controls, 3 non-resistant-to-SSRI MDD patients and 3 resistant-to-SSRI MDD patients.
Counts for each gene show each gene with the frequency of exactly one, except for some miscoded names.
28 genes contained miscoded names (they expressed dates instead of valid identifiers) and we will have to remove them.
Each mRNA reading was already mapped to unique genes expressed as HUGO Gene Nomenclature Committee symbols) so we proceeded to calculate counts per million reads mapped (cmp). All the counts with less than 9 counts where removed and that removed 9350 gene observations. There after that, and after the cleaning the data, there were no duplicates and all the symbols where unique symbols that complied with HUGO Gene Nomenclature Committee. After all removals and manipulations, the dataset coverage was 0.59.
We have kept 12973 gene counts, meaning we had to filter out 0.4188505 of our gene-mapped RNA seq observations
We used TMM (Trimmed Mean of M-values) normalization. For most samples the distribution curve pre and post-normalization is very similar, however there is a noticiable change towards the mean (a bit higher than 5) for the second control (H_neuron_2) and, interestingly, for 2 of the 3 nonremitter patients.
data2plot <- log2(cpm(SSRI_exp_filtered[,2:10]))
SSRI_matrix <- as.matrix(SSRI_exp_filtered[,-1])
rownames(SSRI_matrix) <- SSRI_exp_filtered$gname
# inspired by Yi Fei Huang approach in terms of merging samples into groups
categories <- c("healthy", "healthy", "healthy",
"nonremitter", "nonremitter", "nonremitter",
"remitter", "remitter", "remitter")
# inspired by lecture 4 slides 51 and 52
d = DGEList(counts=SSRI_matrix, group=categories)
# Calculation of normalization factors
d = calcNormFactors(d)
normalized_counts <- cpm(d)
In this section, we will used the normalized data produced above to test for statistically significant expression differences for each of the mapped genes, between our 3 groups (we will call it status) of interest in our designed model: healthy patients exposed to SSRI (control), non-remitters MDD exposed to SSRI (test 2) and remitter MDD exposed to SSRI (test 2).
An expansion of this model includes 2 groups (we will call this attribute “affected”), healthy (ND) and affected by disease (D), although this expansion wouldn’t add granularity to the assessment of different subtypes of MDD (refractive or not to treatment), it could help in increasing specificity to the bayes differential expression significance calculation.
# First we create our model
samples <- data.frame(status = c("H", "H", "H", "NR", "NR","NR", "R", "R", "R"),
affected = c("ND", "ND", "ND", "D", "D","D", "D", "D", "D"))
rownames(samples) <- colnames(SSRI_exp[-1])
# model on affected Vs healthy
model_design <- model.matrix(~samples$affected)
# model on status of remission to SSRI (remitters Vs non-remitters)
model_design_2 <- model.matrix(~samples$status)
# compound model taking into account both conditions will not yield independent coefficients:
# model_design_compound <- model.matrix(~samples$affected + samples$status)
Now we will fit our normalized data to each of the built models.
# As inspired by lecture 6, using limma, doesn't add result in significantly different
# restults compared with chosen approach (see following code chunk):
expressionMatrix <- as.matrix(normalized_counts)
rownames(expressionMatrix) <- SSRI_exp_filtered$gname
colnames(expressionMatrix) <- colnames(SSRI_exp_filtered)[2:10]
minimalSet <- ExpressionSet(assayData=expressionMatrix)
# Fitting for Affected Vs Controls
fit <- lmFit(minimalSet, model_design)
# Fitting for type of SSRI response (remitters and non-remitters)
fit_resp <- lmFit(minimalSet, model_design_2)
Using empirical Bayes method:
fit_2 <- eBayes(fit_resp,trend=TRUE)
fit_resp_2 <- eBayes(fit_resp,trend=TRUE)
Now we can calculate the differential expression and apply the correction for multiple hypothesis test through Benjamni-Hochberg method (less than 1 minute processing time).
#calculate differential expression, inspired by lecture 6
topfit <- topTable(fit_2, coef=ncol(model_design), adjust.method = "BH",
number = nrow(expressionMatrix))
topfit_resp <- topTable(fit_resp_2, coef=ncol(model_design_2), adjust.method = "BH",
number = nrow(expressionMatrix))
gene_names <- data.frame(SSRI_exp[,1])
colnames(gene_names) <- c("hgnc_symbol")
output_hits <- merge(gene_names, topfit, by.y=0, by.x=1, all.y=TRUE)
output_hits_2 <- merge(gene_names, topfit_resp, by.y=0, by.x=1, all.y=TRUE)
#sort by p-value
output_hits <- output_hits[order(output_hits$P.Value),]
output_hits_2 <- output_hits_2[order(output_hits_2$P.Value),]
And the results for each of the models, with and without Benjamni-Hochberg corrections for multiple hypothesis tests are as following.
For Healthy Vs MDD patients:
kable(output_hits[1:10,],type="html",row.names = FALSE)
| hgnc_symbol | logFC | AveExpr | t | P.Value | adj.P.Val | B |
|---|---|---|---|---|---|---|
| ARHGAP20 | 31.790074 | 13.690482 | 9.103935 | 0.0000085 | 0.0762015 | -4.086061 |
| FUT9 | 297.971376 | 143.629692 | 8.744316 | 0.0000117 | 0.0762015 | -4.090963 |
| TGFB2 | 144.836367 | 69.823697 | 8.082447 | 0.0000220 | 0.0952138 | -4.101465 |
| PCDH17 | 161.793691 | 95.354410 | 7.666083 | 0.0000334 | 0.1082782 | -4.109250 |
| CPLX1 | 17.053081 | 11.094874 | 7.232782 | 0.0000525 | 0.1236884 | -4.118530 |
| NXPH4 | 18.140995 | 17.926544 | 7.151887 | 0.0000572 | 0.1236884 | -4.120412 |
| PCDHA12 | 9.388553 | 8.427546 | 6.575008 | 0.0001084 | 0.1769541 | -4.135432 |
| MDK | -268.541560 | 209.302062 | -6.352621 | 0.0001401 | 0.1769541 | -4.142065 |
| TRIM9 | 190.655235 | 140.332625 | 6.250673 | 0.0001580 | 0.1769541 | -4.145284 |
| CENPW | -25.163268 | 15.527767 | -6.182821 | 0.0001712 | 0.1769541 | -4.147491 |
For Remitter status:
kable(output_hits_2[1:10,],type="html",row.names = FALSE)
| hgnc_symbol | logFC | AveExpr | t | P.Value | adj.P.Val | B |
|---|---|---|---|---|---|---|
| FAM222A | 16.581813 | 12.503918 | 6.505383 | 0.0001174 | 0.6138379 | -4.485416 |
| NHSL2 | 5.669770 | 4.068259 | 6.123237 | 0.0001838 | 0.6138379 | -4.488182 |
| CENPW | -21.845698 | 15.527767 | -5.367667 | 0.0004719 | 0.6138379 | -4.494934 |
| TROAP | -14.185957 | 11.867062 | -5.170270 | 0.0006116 | 0.6138379 | -4.497039 |
| AUTS2 | 189.589420 | 242.091556 | 5.079170 | 0.0006906 | 0.6138379 | -4.498066 |
| SHROOM2 | 53.631027 | 45.223557 | 5.009858 | 0.0007581 | 0.6138379 | -4.498871 |
| KIFC1 | -27.544679 | 25.241187 | -4.950808 | 0.0008213 | 0.6138379 | -4.499575 |
| E2F8 | -6.080405 | 4.980789 | -4.922310 | 0.0008537 | 0.6138379 | -4.499920 |
| ATP1A3 | 354.375848 | 301.299852 | 4.910043 | 0.0008682 | 0.6138379 | -4.500070 |
| NXPH4 | 11.796362 | 17.926544 | 4.650586 | 0.0012431 | 0.6138379 | -4.503414 |
Applying a threshold of p-value < 0.05 and computing how many genes passed the test:
length(which(output_hits$P.Value < 0.05))
## [1] 2612
length(which(output_hits_2$P.Value < 0.05))
## [1] 983
Now computing how many mapped genes passed the p-value < 0.05 with Benjamni-Hochberg correction for multiple testing
length(which(output_hits$adj.P.Val < 0.05))
## [1] 0
length(which(output_hits_2$adj.P.Val < 0.05))
## [1] 0
In this approach we will use the model that differentiates between MDD patients that were non-responsive to SSRI (non-remitter or NR) from responsive patients (remitter or R), since this is main approach took by the paper’s authors in its published version’s transcriptome analysis (see (Vadodaria et al., 2019), page 801, specially fig 4.b)
d = DGEList(counts=normalized_counts, group=samples$statusNR)
# Dispersions
d <- estimateDisp(d, model_design_2)
# Computing normalization factors
d <- calcNormFactors(d)
# fitting models
fit <- glmQLFit(d, model_design_2)
# Computing differential expression
qlf.pos_vs_neg <- glmQLFTest(fit, coef='samples$statusNR')
# Results
qlf_output_hits <- topTags(qlf.pos_vs_neg, sort.by = "PValue", n = nrow(SSRI_exp))
# Passing p-value < 0.05?
length(which(qlf_output_hits$table$PValue < 0.05))
## [1] 2536
# Passing correction
length(which(qlf_output_hits$table$FDR < 0.05))
## [1] 0
## [1] 0
Since the previous 2 modeling failed to yield significant results post-correction, we decided to use the model applied by the source study (Vadodaria et al., 2019), page 801. Thus, we design 2 distinct but symmetrical models that will assess remitters MDD patient’s cells VS Healthy control cells and remitters MDD patient’s cells VS non-remitters MDD patient’s cells separately.
samples <- data.frame(status = c(“H”, “H”, “H”, “NR”, “NR”,“NR”, “R”, “R”, “R”), affected = c(“ND”, “ND”, “ND”, “D”, “D”,“D”, “D”, “D”, “D”))
# Creating model for non-remitters VS control assessment
samples_NRxH <- data.frame(status = c("H", "H", "H", "NR", "NR","NR"))
rownames(samples_NRxH) <- colnames(SSRI_exp)[2:7]
model_design_NRxH <- model.matrix(~samples_NRxH$status)
# Creating model for non-remitters VS remitters assessment
samples_NRxR <- data.frame(status = c("NR", "NR","NR", "R", "R", "R"))
rownames(samples_NRxR) <- colnames(SSRI_exp)[5:10]
model_design_NRxR <- model.matrix(~samples_NRxR$status)
Now we apply the each model separately to the appropriate data set:
d_NRxH = DGEList(counts=normalized_counts[,1:6], group=samples_NRxH$statusNR)
d_NRxR = DGEList(counts=normalized_counts[,4:9], group=samples_NRxR$statusNR)
# Dispersions
d_NRxH <- estimateDisp(d_NRxH, model_design_NRxH)
d_NRxR <- estimateDisp(d_NRxR, model_design_NRxR)
# Computing normalization factors
d_NRxH <- calcNormFactors(d_NRxH)
d_NRxR <- calcNormFactors(d_NRxR)
# fitting models
fit_NRxH <- glmQLFit(d_NRxH, model_design_NRxH)
fit_NRxR <- glmQLFit(d_NRxR, model_design_NRxR)
# Computing differential expression
qlf.pos_vs_neg_NRxH <- glmQLFTest(fit_NRxH, coef='samples_NRxH$statusNR')
qlf.pos_vs_neg_NRxR <- glmQLFTest(fit_NRxR, coef='samples_NRxR$statusR')
# Results, now inspired by lecture 7
qlf_output_hits_RNxH <- topTags(qlf.pos_vs_neg_NRxH, sort.by = "PValue", n = nrow(SSRI_exp))
qlf_output_hits_RNxR <- topTags(qlf.pos_vs_neg_NRxR, sort.by = "PValue", n = nrow(SSRI_exp))
# Here we check how many mapped genes passed the 0.05 significance threshold with and without correction
# for the comparison remitter Vs control
length(which(qlf_output_hits_RNxH$table$PValue < 0.05))
## [1] 2306
length(which(qlf_output_hits_RNxH$table$FDR < 0.05))
## [1] 0
# Here we check how many mapped genes passed the 0.05 significance threshold with and without correction
# for the comparison non-remitters Vs remitters
length(which(qlf_output_hits_RNxR$table$PValue < 0.05))
## [1] 732
length(which(qlf_output_hits_RNxR$table$FDR < 0.05))
## [1] 0
Although the results indicate the need for enrichment set analysis (which is one of the purpose of this assignment, as follows), this results are now more comparable with the source literature(Vadodaria et al., 2019), in which, at least for the set of genes presented there, the differential expression between remitters and controls is larger than the one between remitters and non-remitters.
Another thresholding, now with p=0.09 is now performed, despite not being considered as a strong evidence of up or downregulation. We decided to perform this aditional less stringent comparision in order to compare with the original paper’s finding of HTR set (specifically HTR2A) are up-regulated in non-remitters compared to remmiters.
length(which(qlf_output_hits_RNxR$table$PValue < 0.09))
## [1] 1275
And it increases drastically the amount of hits, compatible with what the paper presented, although we won’t proceed further with the 0.09 threshold in order to keep relevance to any eventual finding.
Our top results through the last model were as follows, for the remitters VS control comparison:
kable(topTags(qlf.pos_vs_neg_NRxH), type="html")
|
|
|
|
And for the non-remitters VS remitters comparision:
kable(topTags(qlf.pos_vs_neg_NRxR), type="html")
|
|
|
|
In order to have a more general view of the differences in expression among non-remitters and controls and among remitters and non-remitters we decided to build an MA plot for each of these assessments. For that, we will use plotMA from the limma package(Ritchie et al., 2015). We will also use the
Comparing first pair of samples (remmitter Vs non-remmiters and its control)
plotMA(log2(SSRI_exp[,c(5,8)]), ylab="M - ratio log expression", main="MA plot R VS NR 1st pair", status = SSRI_exp[,c(2)])
Legend: Differential expression profile on the first pair of samples (remmitter Vs non-remmiters and its control)
Second pair:
plotMA(log2(SSRI_exp[,c(6,9)]), ylab="M - ratio log expression", main="MA plot R VS NR 2nd pair", status = SSRI_exp[,c(3)])
Legend: Differential expression profile on the second pair of samples (remmitter Vs non-remmiters and its control)
Third pair:
plotMA(log2(SSRI_exp[,c(7,10)]), ylab="M - ratio log expression", main="MA plot R VS NR 3rd pair", status = SSRI_exp[,c(4)])
Legend: Differential expression profile on the third pair of samples (remmitter Vs non-remmiters and its control)
No useful insight is prompt when such large number of gene mapping is available so we decided to highlight only the group of genes related to the paper’s biological hypothesis, concerning the following HTR family genes, which interestingly are more actively differentially expressed than the average:
plotMA(log2(SSRI_exp[7844:7866,c(5,8)]), ylab="M - ratio log expression", main="HTR family MA plot R VS NR 1st pair", status = SSRI_exp[,c(2)])
Legend: Differential expression profile on the first pair of samples (remmitter Vs non-remmiters and its control for the following HTR family members:
1)HTR1A 2)HTR1B 3)HTR1D 4)HTR1E 5)HTR1F 6)HTR2A
7)HTR2B 8)HTR2C 9)HTR3A 10)HTR3B 11)HTR3C 12)HTR3D 13)HTR3E
plotMA(log2(SSRI_exp[7844:7866,c(6,9)]), ylab="M - ratio log expression", main="HTR family MA plot R VS NR 2nd pair", status = SSRI_exp[,c(3)])
Legend: Differential expression profile on the second pair of samples (remmitter Vs non-remmiters and its control for the following HTR family members:
1)HTR1A 2)HTR1B 3)HTR1D 4)HTR1E 5)HTR1F 6)HTR2A
7)HTR2B 8)HTR2C 9)HTR3A 10)HTR3B 11)HTR3C 12)HTR3D 13)HTR3E
plotMA(log2(SSRI_exp[7844:7866,c(7,10)]), ylab="M - ratio log expression", main="HTR family MA plot R VS NR 3rd pair", status = SSRI_exp[,c(4)])
Legend: Differential expression profile on the third pair of samples (remmitter Vs non-remmiters and its control for the following HTR family members:
1)HTR1A 2)HTR1B 3)HTR1D 4)HTR1E 5)HTR1F 6)HTR2A
7)HTR2B 8)HTR2C 9)HTR3A 10)HTR3B 11)HTR3C 12)HTR3D 13)HTR3E
In order to visualize our mostly diferentially expressed genes we used a heat map as provided by the ComplexHeatmap package(Gu et al., 2016)), also using circlize package (Gu et al., 2014). However the processing time was long for the whole dataset using those tools.
# As inspired in Lecture 6
# Creating matrix
heatmap_matrix <- normalized_counts[,2:ncol(normalized_counts)]
rownames(heatmap_matrix) <- rownames(normalized_counts)
colnames(heatmap_matrix) <- colnames(normalized_counts[,2:ncol(normalized_counts)])
heatmap_matrix <- t(scale(t(heatmap_matrix)))
# Setting colours
if(min(heatmap_matrix) == 0){
heatmap_col = colorRamp2(c( 0, max(heatmap_matrix)),
c( "white", "red"))
} else {
heatmap_col = colorRamp2(c(min(heatmap_matrix), 0,
max(heatmap_matrix)), c("blue", "white", "red"))
}
# Creating heat map
current_heatmap <- Heatmap(as.matrix(heatmap_matrix),
show_row_dend = TRUE,show_column_dend = TRUE,
col=heatmap_col,show_column_names = TRUE,
show_row_names = FALSE,show_heatmap_legend = TRUE)
current_heatmap
So we decided to use the limma package(Ritchie et al., 2015) tools instead, using a threshold of 0.01 for significance for a cleaner view.
First for non-remitters VS controls:
# As inspired by lecture 6
heatmap_matrix <- normalized_counts[,1:6]
rownames(heatmap_matrix) <- rownames(normalized_counts)
colnames(heatmap_matrix) <- colnames(normalized_counts[,1:6])
heatmap_matrix <- t(scale(t(heatmap_matrix)))
top_hits <- output_hits_2$hgnc_symbol[output_hits_2$P.Value<0.01]
heatmap_matrix_tophits <- t(scale(t(heatmap_matrix[which(rownames(heatmap_matrix)
%in% top_hits),])))
# UNCOMMENT THIS TO HIDE CLUSTERING
# This is to specifically avoid clustering by condition
# heatmap_matrix_tophits<- heatmap_matrix_tophits[,
# c(grep(colnames(heatmap_matrix_tophits),pattern = "1"),
# grep(colnames(heatmap_matrix_tophits),pattern = "2"),
# grep(colnames(heatmap_matrix_tophits),pattern = "3"))]
if(min(heatmap_matrix_tophits) == 0){
heatmap_col = colorRamp2(c( 0, max(heatmap_matrix_tophits)),
c( "white", "red"))
} else {
heatmap_col = colorRamp2(c(min(heatmap_matrix_tophits), 0,
max(heatmap_matrix_tophits)),
c("blue", "white", "red"))
}
current_heatmap <- Heatmap(as.matrix(heatmap_matrix_tophits),
cluster_rows = TRUE, show_row_dend = TRUE,
cluster_columns = FALSE,show_column_dend = FALSE,
col=heatmap_col,show_column_names = TRUE,
show_row_names = FALSE,show_heatmap_legend = TRUE)
current_heatmap
Legend: The conditions here (status of non-remitter or control) are intentionally clustered together due to the order of the matrix (cluster argument still set to FALSE) used on the heat map (uncomment unclustering code to see it otherwise). One can notice that the bottom half of the over expressed sets on the non-remitter band is more active at remitters neurons 2 and 3 but still overexpressed across all test samples.
Now for non-remitters VS remitters:
# As inspired by lecture 6
heatmap_matrix <- normalized_counts[,4:9]
rownames(heatmap_matrix) <- rownames(normalized_counts)
colnames(heatmap_matrix) <- colnames(normalized_counts[,4:9])
heatmap_matrix <- t(scale(t(heatmap_matrix)))
top_hits <- output_hits_2$hgnc_symbol[output_hits_2$P.Value<0.01]
heatmap_matrix_tophits <- t(scale(t(heatmap_matrix[which(rownames(heatmap_matrix)
%in% top_hits),])))
# UNCOMMENT THIS TO HIDE CLUSTERING
# This is to specifically avoid clustering by condition
# heatmap_matrix_tophits<- heatmap_matrix_tophits[,
# c(grep(colnames(heatmap_matrix_tophits),pattern = "1"),
# grep(colnames(heatmap_matrix_tophits),pattern = "2"),
# grep(colnames(heatmap_matrix_tophits),pattern = "3"))]
if(min(heatmap_matrix_tophits) == 0){
heatmap_col = colorRamp2(c( 0, max(heatmap_matrix_tophits)),
c( "white", "red"))
} else {
heatmap_col = colorRamp2(c(min(heatmap_matrix_tophits), 0,
max(heatmap_matrix_tophits)),
c("blue", "white", "red"))
}
current_heatmap <- Heatmap(as.matrix(heatmap_matrix_tophits),
cluster_rows = TRUE, show_row_dend = TRUE,
cluster_columns = FALSE,show_column_dend = FALSE,
col=heatmap_col,show_column_names = TRUE,
show_row_names = FALSE,show_heatmap_legend = TRUE)
current_heatmap
The conditions here (status of remitter or non-remitter) are intentionally clustered together due to the order of the matrix used on the heat map (cluster argument still set to FALSE). One can notice that except for a mid-range band across all NR’s and except for the lower half of NR_neuron_1, the non-remitter are active than the correspondent band on the remitter neurons.
First comparing non-remitters and controls:
## up regulated genes:
length(which(qlf_output_hits_RNxH$table$PValue < 0.05
& qlf_output_hits_RNxH$table$logFC > 0))
## [1] 1345
## down regulated genes:
length(which(qlf_output_hits_RNxH$table$PValue < 0.05
& qlf_output_hits_RNxH$table$logFC < 0))
## [1] 961
We have 1345 genes significantly up-regulated in non-remitters comparing to controls, and 961 down-regulated.
Now comparing non-remitters and remitters:
## up regulated genes:
length(which(qlf_output_hits_RNxR$table$PValue < 0.05
& qlf_output_hits_RNxR$table$logFC > 0))
## [1] 379
## down regulated genes:
length(which(qlf_output_hits_RNxR$table$PValue < 0.05
& qlf_output_hits_RNxR$table$logFC < 0))
## [1] 353
We have 379 genes significantly up-regulated in non-remitters comparing to controls, and 353 down-regulated, which is compatible with finding on paper, where the delta of expression among MDD patients is decreased when compared to remitters and controls.
Here we will create a thresholded list, initially with the remitters dataset compared to the non remitters dataset, in which our rank will consist in the following product: -log(p-value) * logFC. We then create 3 files to contain, separately:
# Inspired by lecture 7
# Attached name to gene results matrix
qlf_output_hits_withgn_RNxR <- merge(SSRI_exp[,1], qlf_output_hits_RNxR, by.x=1, by.y=0)
colnames(qlf_output_hits_withgn_RNxR) <- c("gname", colnames(qlf_output_hits_withgn_RNxR)[-1])
# Calculating our rank as -log(p-value) * logFC
qlf_output_hits_withgn_RNxR[,"rank"] <- -log(qlf_output_hits_withgn_RNxR$PValue,base=10) *
sign(qlf_output_hits_withgn_RNxR$logFC)
# Ordering per rank
qlf_output_hits_withgn_RNxR <- qlf_output_hits_withgn_RNxR[order(qlf_output_hits_withgn_RNxR$rank),]
# Separating up-regulated genes with threshold of 0.05
# Note: the previous calculation modeled for FC R/NR
# here, we are interested in FC NR/R
upregulated_genes_RNxR <- qlf_output_hits_withgn_RNxR$gname[which(qlf_output_hits_withgn_RNxR$PValue < 0.05
& qlf_output_hits_withgn_RNxR$logFC < 0)]
# Separating down-regulated genes with threshold of 0.05
downregulated_genes_RNxR <- qlf_output_hits_withgn_RNxR$gname[which(qlf_output_hits_withgn_RNxR$PValue
< 0.05 & qlf_output_hits_withgn_RNxR$logFC > 0)]
Now doing the same to compare non-remitters to remitters:
# Inspired by lecture 7
# Attached name to gene results matrix
qlf_output_hits_withgn_RNxH <- merge(SSRI_exp[,1], qlf_output_hits_RNxH, by.x=1, by.y=0)
colnames(qlf_output_hits_withgn_RNxH) <- c("gname", colnames(qlf_output_hits_withgn_RNxH)[-1])
# Calculating our rank as -log(p-value) * logFC
qlf_output_hits_withgn_RNxH[,"rank"] <- -log(qlf_output_hits_withgn_RNxH$PValue,base=10) *
sign(qlf_output_hits_withgn_RNxH$logFC)
# Ordering per rank
qlf_output_hits_withgn_RNxH <- qlf_output_hits_withgn_RNxH[order(qlf_output_hits_withgn_RNxH$rank),]
# Separating up-regulated genes with threshold of 0.05
upregulated_genes_RNxH <- qlf_output_hits_withgn_RNxH$gname[which(qlf_output_hits_withgn_RNxH$PValue < 0.05
& qlf_output_hits_withgn_RNxH$logFC > 0)]
# Separating down-regulated genes with threshold of 0.05
downregulated_genes_RNxH <- qlf_output_hits_withgn_RNxH$gname[which(qlf_output_hits_withgn_RNxH$PValue
< 0.05 & qlf_output_hits_withgn_RNxH$logFC < 0)]
Save tables for later use if docker port connection is properly configured:
dir.create('output', showWarnings=FALSE)
# Saving up-regulated genes NRxR
write.table(x=upregulated_genes_RNxR, file=file.path("./","RNxR_upregulated_genes.txt"),sep="\t",
row.names = FALSE,col.names = FALSE,quote = FALSE)
# Saving down-regulated genes NRxR
write.table(x=downregulated_genes_RNxR, file=file.path("./","RNxR_downregulated_genes.txt"),sep="\t",
row.names = FALSE,col.names = FALSE,quote = FALSE)
# Saving complete list NRxR
write.table(x=data.frame(genename=qlf_output_hits_withgn_RNxR$gname,
F_stat=qlf_output_hits_withgn_RNxR$rank
),
file=file.path("./","RNxR_ranked_genelist.txt"),
sep = "\t",
row.names = FALSE,col.names = FALSE,quote = FALSE)
# Saving up-regulated genes RNxH
write.table(x=upregulated_genes_RNxH, file=file.path("./","RNxH_upregulated_genes.txt"),sep="\t",
row.names = FALSE,col.names = FALSE,quote = FALSE)
# Saving down-regulated genes RNxH
write.table(x=downregulated_genes_RNxH, file=file.path("./","RNxH_downregulated_genes.txt"),sep="\t",
row.names = FALSE,col.names = FALSE,quote = FALSE)
# Saving complete list RNxH
write.table(x=data.frame(genename=qlf_output_hits_withgn_RNxH$gname,
F_stat=qlf_output_hits_withgn_RNxH$rank
),
file=file.path("./","RNxH_ranked_genelist.txt"),
sep = "\t",
row.names = FALSE,col.names = FALSE,quote = FALSE)
We will be using G:Profiler web server(Uku Raudvere, 2019)to perform the above gene list enrichment analysis (version e102_eg49_p15_7a9b4d6).
Thus we will be using the following annotation database in the enrichment:
All electronic GO annotations were discarded and the annotation term size was restricted to 200 genes.
The tool uses Fisher Exact test. We used 0.05 threshold for significance. In our list of up and down regulated genes we could filter for only 4 members of HTR gene family, since we had a strict standard to cut out any mapped genes with less than 9 reads. Thus we have now decided to use Benjamni-Hochberg correction for multiple hypothesis testing due to its less stringent profile compared to Bonferoni correction.
Fig 1 - Legend: ORA results non-remitters VS controls in up-regulated genes
Fig 2 - Legend: ORA with Wiki Pathway results for non-remitters VS controls in up-regulated genes, here we notice noticed all 31 results over the 0.05 threshold and terms of maximum size of 200.
Fig 3 - Legend: ORA with Reactome results for non-remitters VS controls in up-regulated genes, here we notice the most significant results of a total of 89 results over the 0.05 threshold with terms of maximum size of 200.
Fig 4 - Legend: ORA with GO results for non-remitters VS controls in up-regulated genes, here we notice noticed the 35 most significant entries of a total of 467 results over the 0.05 threshold and terms of maximum size of 200.
Fig 5 - Legend: ORA results non-remitters VS controls in down-regulated genes
Fig 6 - Legend: ORA with Wiki Pathway results for non-remitters VS controls in down-regulated genes, here we notice noticed all 26 results over the 0.05 threshold and terms of maximum size of 200.
Fig 7 - Legend: ORA with Reactome results for non-remitters VS controls in down-regulated genes, here we notice the most significant results of a total of 139 results over the 0.05 threshold with terms of maximum size of 200.
Fig 8 - Legend: ORA with GO results for non-remitters VS controls in down-regulated genes, here we notice noticed the most significant entries of a total of 89 results over the 0.05 threshold and terms of maximum size of 200.
Fig 9 - Legend: ORA results non-remitters VS controls in all-regulated genes
Fig 10 - Legend: ORA with Wiki Pathway results for non-remitters VS controls in all genes, here we notice noticed the most significant of the 175 results over the 0.05 threshold and terms of maximum size of 200.
Fig 11 - Legend: ORA with Reactome results for non-remitters VS controls in all genes, here we notice the most significant results of a total of 601 results over the 0.05 threshold with terms of maximum size of 200.
Fig 12 - Legend: ORA with GO results for non-remitters VS controls in all genes, here we notice noticed the most significant entries of a total of 962 results over the 0.05 threshold and terms of maximum size of 200.
Fig 13, ORA results non-remitters VS remitters in up-regulated genes
Fig 13 - Legend: ORA results non-remitters VS remitters in up-regulated genes
Fig 14 ORA with Wiki Pathways results for non-remitters VS remitters in up-regulated genes
Fig 14 - Legend: ORA with Wiki Pathway results for non-remitters VS remitters in up-regulated genes, here we notice noticed all 12 results over the 0.05 threshold and terms of maximum size of 200.
Fig 15 ORA with reactome results for non-remitters VS remitters in up-regulated genes
Fig 15 - Legend: ORA with Reactome results for non-remitters VS remitters in up-regulated genes, here we notice all 28 results over the 0.05 threshold with terms of maximum size of 200.
Fig 16 ORA with GO results for non-remitters VS remitters in up-regulated genes
Fig 16 - Legend: ORA with GO results for non-remitters VS remitters in up-regulated genes, here we notice noticed the most significant entries of a total of 110 results over the 0.05 threshold and terms of maximum size of 200.
Fig 17 - Legend: ORA results non-remitters VS remitters in down-regulated genes
Fig 18 - Legend: ORA with Wiki Pathway results for non-remitters VS remitters in down-regulated genes, here we notice noticed all 3 results over the 0.05 threshold and terms of maximum size of 200.
Fig 19 - Legend: ORA with Reactome results for non-remitters VS remitters in down-regulated genes, here we notice no result passed the 0.05 threshold with terms of maximum size of 200.
Fig 20 - Legend: ORA with GO results for non-remitters VS remitters in down-regulated genes, here we notice noticed all the 9 results over the 0.05 threshold and terms of maximum size of 200.
Fig 13 - Legend: ORA results non-remitters VS remitters in all-regulated genes
Fig 14 - Legend: ORA with Wiki Pathway results for non-remitters VS remitters in all genes, here we notice noticed the most significant of the 123 results over the 0.05 threshold and terms of maximum size of 200.
Fig 15 - Legend: ORA with Reactome results for non-remitters VS remitters in all genes, here we notice the most significant results of a total of 626 results over the 0.05 threshold with terms of maximum size of 200.
Fig 16 - Legend: ORA with GO results for non-remitters VS remitters in all genes, here we notice noticed the most significant entries of a total of 981 results over the 0.05 threshold and terms of maximum size of 200.
The forebrain neurons used in the study, as discussed in A1, were developed through a dedifferentiation technique which induce the formation pluripotent stem cells (iPSCs) from skin fibroblast cells. That technique is related to up and down regulation of several groups of genes that will be reflected on the results at section 3. Many of the results, especially from GO annotation correlates with literature on the dedifferentiation technique (Soliman MA, 2017)(Brennand KJ, 2011).
Even after taking that into consideration, the number of possible functional pathways elicited by the enrichment analysis revealed, especially when using reactome and Wiki Pathways annotations, that the number of over represented sets on the comparison of non-remitters and controls is much higher than at the comparison of non-remitters and remitters, which is also compatible with the source study(Vadodaria et al., 2019), but also in literature comparing pharmacological-response phenotype subgroups among those affected by MDD (Mrazek DA, 2014)(Drysdale AT, 2017).
Important to notice is the over-representation of eye development pathways in neurons exposed to SSRI, a finding that strongly correlate to 5-HT role in retinal process(George et al., 2005), and should therefore impair the viability of hypothesis based on its differential expression as markers of MDD itself.
The most interesting finding from the current thresholded enrichment analysis, even after pondering over the limitations above, is the presence of over-represented sets on opiate receptor family and monoamine transport pathways, the later being confirmed by the source literature(Vadodaria et al., 2019). The role of endogenous opiate in SSRI-resistant MDD patients may be, at least preliminary, good targets for further consideration, which I hope to undertake during A3 and its ranked approach to enrichment analysis.
Using the main paper’s modeling (please see section 2.1.3), 2306 were differentially expressed for the non-remitters VS controls analysis and 732 for the remitters VS non-remitters analysis. For the other models please see section 2.1.1 and 2.1.2.
The threshold used was 0.05 due to its widely accepted marker of significance although an analysis with 0.09 for the paper’s model was also used in section 2.1.3 due to the fact that the source paper uses it for one of it tests (please see details at 2.1.3)
Please see section 2 for details.
We used Benjamni-Hochberg correction for multiple testing.
Reason: In A1, we could filter for only 4 members of HTR gene family (which is relevant for the main paper), since we had a strict standard to cut out any mapped genes with less than 9 reads. Thus we have now decided to use Benjamni-Hochberg correction for multiple hypothesis testing due to its less stringent profile compared to Bonferoni correction.
No gene passed the correction, prompting for the use of functional enrichment analysis.
Please see section 2.2
Please see section 2.3. Conditions intentionally cluster together due to the order of processing, despite cluster argument being set to FALSE. (uncomment related chunk to uncluster, please see details on section 2.3)
Please see section 3.1 and 3.2 for details.
We used Fisher Exact test on over-representation analysis through the G:Profiler web server(Uku Raudvere, 2019) (version e102_eg49_p15_7a9b4d6). We used this method for being appropriate to our thresholded list and easier to automate since it is built in at the G:Profiler tool for functional enrichment analysis.
We use Benjamni-Hochberg correction for multiple hypothesis testing due to its less stringent profile compared to Bonferoni correction
Please see section 3.3 for details.
Please see section 3.3 for details.
ORA with Wiki Pathway results for non-remitters VS controls in up-regulated genes:
ORA with Reactome results for non-remitters VS controls in up-regulated genes:
ORA with GO results for non-remitters VS controls in up-regulated genes:
ORA with Wiki Pathway results for non-remitters VS controls in down-regulated genes:
ORA with Reactome results for non-remitters VS controls in down-regulated genes:
ORA with GO results for non-remitters VS controls in down-regulated genes:
ORA with Wiki Pathway results for non-remitters VS controls in all genes:
ORA with Reactome results for non-remitters VS controls in all genes:
ORA with GO results for non-remitters VS controls in all genes:
ORA with Wiki Pathway results for non-remitters VS remitters in up-regulated genes:
ORA with Reactome results for non-remitters VS remitters in up-regulated genes:
OORA with GO results for non-remitters VS remitters in up-regulated genes:
ORA with Wiki Pathway results for non-remitters VS remitters in down-regulated genes:
ORA with Reactome results for non-remitters VS remitters in down-regulated genes:
ORA with GO results for non-remitters VS remitters in down-regulated genes:
ORA with Wiki Pathway results for non-remitters VS remitters in all genes:
ORA with Reactome results for non-remitters VS remitters in all genes:
ORA with GO results for non-remitters VS remitters in all genes:
The results on up-regulated and down-regulated were much fewer but much more significant the the ones yielded by ORA on the whole list, compatible with what was expected. Please see section 4 for details.
Yes, please see details in section 4.
Brennand KJ, et a. (2011). Modelling schizophrenia using human induced pluripotent stem cells. Nature, 473, 221–225.
Davis, S., & Meltzer, P. (2007). GEOquery: A bridge between the gene expression omnibus (geo) and bioconductor. Bioinformatics, 14, 1846–1847.
Drysdale AT, et a. (2017). Resting-state connectivity biomarkers define neurophysiological subtypes of depression. Nat Med., 23, 28–38.
Gene expression omnibus. (n.d.). In National Center for Biotechnology Information. U.S. National Library of Medicine. https://www.ncbi.nlm.nih.gov/geo/
George, A., Schmid, K. L., & Pow, D. V. (2005). Retinal serotonin, eye growth and myopia development in chick. Experimental Eye Research, 81(5), 616–625.
Gu, Z., Eils, R., & Schlesner, M. (2016). Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics.
Gu, Z., Gu, L., Eils, R., Schlesner, M., & Brors, B. (2014). Circlize implements and enhances circular visualization in r. Bioinformatics, 30(19), 2811–2812.
Morgan, M. (2019). BiocManager: Access the bioconductor project package repository. https://CRAN.R-project.org/package=BiocManager
Mrazek DA, et a. (2014). Treatment outcomes of depression: The pharmacogenomic research network antidepressant medication pharmacogenomic study. J Clin Psychopharmacol., 34, 313–317.
Müller, K., Wickham, H., James, D. A., & Falcon, S. (2020). RSQLite: ’SQLite’ interface for r.
Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., & Smyth, G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7), e47. https://doi.org/10.1093/nar/gkv007
Soliman MA, Z. N., Aboharb F. (2017). Pluripotent stem cells in neuropsychiatric disorders. Mol Psychiatry, 22, 1241–1249.
Uku Raudvere, I. K., Liis Kolberg. (2019). G:Profiler: A web server for functional enrichment analysis and conversions of gene lists.
Vadodaria, K. C., Ji, Y., Skime, M., Paquola, A., Nelson, T., Hall-Flavin, D., Fredlender, C., Heard, K. J., Deng, Y., Le, A. T., & others. (2019). Serotonin-induced hyperactivity in ssri-resistant major depressive disorder patient-derived neurons. Molecular Psychiatry, 24(6), 795–807.
Vadodaria KC, G. F. (2019). Serotonin-induced hyperactivity in ssri-resistant major depressive disorder patient-derived neurons. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE125664
Xie, Y. (2021). Knitr: A general-purpose package for dynamic report generation in r. https://yihui.org/knitr/
Zhu, Y., Davis, S., Stephens, R., Meltzer, P. S., & Chen, Y. (2008). GEOmetadb: Powerful alternative search engine for the gene expression omnibus. Bioinformatics (Oxford, England), 24(23), 2798–2800. https://doi.org/10.1093/bioinformatics/btn520